
scikit-learn

scikit-learn is a widely used Python module for classic machine learning. It is built on top of SciPy.

Here are 6,822 public repositories matching this topic...

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Updated Oct 12, 2022
  • Python

A comprehensive list of Deep Learning / Artificial Intelligence and Machine Learning tutorials - rapidly expanding into areas of AI/Deep Learning / Machine Vision / NLP and industry specific areas such as Climate / Energy, Automotives, Retail, Pharma, Medicine, Healthcare, Policy, Ethics and more.

  • Updated Nov 24, 2022
  • Python
igel
mljar-supervised

Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.

  • Updated Oct 1, 2020
  • Jupyter Notebook
text-analytics-with-python

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.

  • Updated Dec 25, 2020
  • Jupyter Notebook
Otto

Otto makes machine learning an intuitive, natural language experience. 🏆 Facebook AI Hackathon winner ⭐️ #1 Trending on MadeWithML.com ⭐️ #4 Trending JavaScript Project on GitHub ⭐️ #15 Trending (All Languages) on GitHub

  • Updated Nov 14, 2022
  • JavaScript
Neuraxle

The world's cleanest AutoML library - Do hyperparameter tuning with the right pipeline abstractions to write clean deep learning production pipelines. Let your pipeline steps have hyperparameter spaces. Design steps in your pipeline like components. Compatible with Scikit-Learn, TensorFlow, and most other libraries, frameworks and MLOps environments.

  • Updated Aug 17, 2022
  • Python
Hyperactive

libfaceid is a research framework for prototyping of face recognition solutions. It seamlessly integrates multiple detection, recognition and liveness models w/ speech synthesis and speech recognition.

  • Updated Nov 22, 2022
  • Python

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

  • Updated Nov 29, 2020
  • Jupyter Notebook
explainx

Explainable AI framework for data scientists. Explain & debug any blackbox machine learning model with a single line of code. We are looking for co-authors to take this project forward. Reach out @ ms8909@nyu.edu

  • Updated Sep 16, 2022
  • Jupyter Notebook
abess
skforecast

An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.

  • Updated Aug 9, 2017
  • Python
imbalanced-ensemble

Concrete-ML is a Privacy-Preserving Machine Learning (PPML) open-source set of tools which aims to simplify the use of fully homomorphic encryption (FHE) for data scientists. Particular care was given to the simplicity of our Python package in order to make it usable by any data scientist, even those without prior cryptography knowledge.

  • Updated Nov 28, 2022
  • Python
nlp_workshop_odsc_europe20

Extensive tutorials for the Advanced NLP Workshop at the Open Data Science Conference Europe 2020. We will leverage machine learning, deep learning and deep transfer learning to learn and solve popular NLP tasks including NER, Classification, Recommendation / Information Retrieval, Summarization, Language Translation, Q&A and Topic Models.

  • Updated Sep 18, 2020
  • Jupyter Notebook

Classifying the physical activities performed by a user based on accelerometer and gyroscope sensor data collected by a smartphone in the user’s pocket. The activities to be classified are: Standing, Sitting, Stairsup, StairsDown, Walking and Cycling.

  • Updated Jun 23, 2017
  • Python
Machine-Learning

Python code with different applications such as machine learning and deep learning techniques, statistics fundamentals, and regression and classification problems. Videos with the theoretical explanations are available on my YouTube channel.

  • Updated Oct 19, 2020
  • Jupyter Notebook

This was my Master's project, in which I used a dataset from the Wireless Sensor Data Mining Lab (WISDM) to build a machine learning model that predicts basic human activities from smartphone accelerometer data. The deep network was built with the TensorFlow framework, using recurrent neural networks with multiple stacked long short-term memory (LSTM) units. After the model was trained, it was saved and exported to an Android application, which makes predictions with the model and speaks the results through a text-to-speech API.

  • Updated Sep 26, 2017
  • Python

A collection of ML-related material, including notebooks, code, and a curated list of useful resources such as books and software. Almost everything mentioned here is free (as in speech, not as in free food) or open source.

  • Updated Nov 18, 2022
  • Jupyter Notebook
Stock_Market_Data_Analysis

Scrape, analyze & visualize stock market data for the S&P500 using Python. Build a basic trading strategy using machine learning to assess company performance and determine buy, sell, hold. Read me & instructions available in Spanish. This is a working repo, with plans to expand the project from technical analysis to fundamental analysis.

  • Updated Aug 7, 2020
  • Jupyter Notebook

Collections of data science, machine learning and data visualization projects using pandas, sklearn, matplotlib, TensorFlow 2, Keras, and various ML algorithms such as random forest classifiers, boosting, etc.

  • Updated Apr 4, 2022
  • Jupyter Notebook
Machine-Learning-with-Scikit-Learn-Python-3.x

In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number (for instance, a multi-dimensional entry, also known as multivariate data), it is said to have several attributes or features. Learning problems fall into a few categories.

Supervised learning, in which the data comes with additional attributes that we want to predict (see the scikit-learn supervised learning page). This problem can be either classification or regression.

Classification: samples belong to two or more classes, and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem is handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning, where there is a limited number of categories and each of the n samples provided must be labeled with the correct category or class.

Regression: if the desired output consists of one or more continuous variables, the task is called regression. An example of a regression problem is predicting the length of a salmon as a function of its age and weight.

Unsupervised learning, in which the training data consists of a set of input vectors x without any corresponding target values (see the scikit-learn unsupervised learning page). The goal in such problems may be to discover groups of similar examples within the data (clustering), to determine the distribution of data within the input space (density estimation), or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.

  • Updated Jun 17, 2021
  • Jupyter Notebook
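The three task types described in the entry above (classification, regression, clustering) can be sketched with scikit-learn as follows. This is a minimal illustration, not code from the listed repository: the choice of estimators (`LogisticRegression`, `LinearRegression`, `KMeans`) and the synthetic data for the regression and clustering parts are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits, make_blobs, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Classification (supervised): predict a discrete digit label from pixel values.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
clf_accuracy = clf.score(X_test, y_test)  # fraction of correct test labels

# Regression (supervised): predict a continuous target.
# Synthetic data stands in for something like salmon age/weight -> length.
X_reg, y_reg = make_regression(n_samples=200, n_features=2, noise=1.0, random_state=0)
reg = LinearRegression().fit(X_reg, y_reg)
reg_r2 = reg.score(X_reg, y_reg)  # coefficient of determination on the training data

# Clustering (unsupervised): group samples without using any target values.
X_blobs, _ = make_blobs(n_samples=150, centers=3, random_state=0)
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_blobs)

print(f"classification accuracy: {clf_accuracy:.3f}")
print(f"regression R^2: {reg_r2:.3f}")
print(f"clusters found: {sorted(set(cluster_labels))}")
```

The supervised estimators receive both `X` and `y` in `fit`, while `KMeans` only ever sees `X`, which is the practical difference between the two categories.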

Formed trajectories of sets of points. Experimented on finding similarities between trajectories based on DTW (Dynamic Time Warping) and LCSS (Longest Common SubSequence) algorithms. Modeled trajectories as strings based on a Grid representation. Benchmarked KNN, Random Forest, Logistic Regression classification algorithms to classify efficiently trajectories.

  • Updated Jun 22, 2022
  • Python
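For readers unfamiliar with Dynamic Time Warping, mentioned in the entry above, here is a minimal sketch of the classic dynamic-programming formulation applied to trajectories of 2D points. The function name `dtw_distance` and the use of Euclidean distance between points are assumptions for illustration; the listed repository may implement it differently.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW between two sequences of points."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            # Extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


straight = [(0, 0), (1, 1), (2, 2)]
slow_start = [(0, 0), (1, 1), (1, 1), (2, 2)]  # same path, one point repeated
shifted = [(0, 1), (1, 2), (2, 3)]             # same shape, moved up by 1

print(dtw_distance(straight, slow_start))
print(dtw_distance(straight, shifted))
```

Because DTW may align one point with several points of the other sequence, the repeated point in `slow_start` adds no cost, whereas the shifted trajectory accumulates a distance at every step.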

An in-depth analysis of audio classification on the RAVDESS dataset. Feature engineering, hyperparameter optimization, model evaluation, and cross-validation with a variety of ML techniques and MLP

  • Updated Nov 6, 2020
  • Jupyter Notebook
Cryptocurrency-Prediction-with-Artificial-Intelligence-V3.0-GRU-Neural-Network

Movie Recommendation Chatbot provides information about a movie, such as plot, genre, revenue, budget, IMDb rating and IMDb links. The model was trained with Kaggle's movies metadata dataset. To recommend similar movies, cosine similarity and a TF-IDF vectorizer were used. The Slack API was used to provide a front end for the chatbot. IBM Watson was used to link the Python code for Natural Language Processing with the front end hosted on the Slack API. Libraries like nltk, sklearn, pandas and nlp were used to perform Natural Language Processing and cater to user queries and responses.

  • Updated Jun 27, 2020
  • Jupyter Notebook
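The TF-IDF plus cosine similarity approach named in the entry above can be sketched with scikit-learn as shown below. The movie titles, plot strings, and the `recommend` helper are invented for illustration and are not taken from the listed repository or from the Kaggle dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue: title -> plot summary
plots = {
    "Space Rescue": "astronauts stranded in space attempt a daring rescue mission",
    "Orbit Down": "a space station crew fights to survive after a mission goes wrong in space",
    "Pastry Wars": "two rival bakers compete in a small town pastry contest",
}
titles = list(plots)

# Turn each plot into a TF-IDF vector, ignoring common English stop words
tfidf = TfidfVectorizer(stop_words="english").fit_transform(list(plots.values()))

# Pairwise cosine similarity between all plot vectors (square matrix)
similarity = cosine_similarity(tfidf)


def recommend(title, top_n=1):
    """Return the top_n titles whose plots are most similar to the given title."""
    idx = titles.index(title)
    ranked = similarity[idx].argsort()[::-1]  # indices by descending similarity
    return [titles[i] for i in ranked if i != idx][:top_n]


print(recommend("Space Rescue"))
```

Here the two space-themed plots share terms such as "space" and "mission", so they score higher against each other than against the unrelated baking plot.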

Lectures on "crime and political corruption analysis using data mining, machine learning and complex networks" at the School of Applied Mathematics in the Institute of Mathematics and Computer Science at University of São Paulo

  • Updated Jul 7, 2019
  • Jupyter Notebook

Price Prediction Case Study predicting the Bitcoin price and the Google stock price using Deep Learning, RNN with LSTM layers with TensorFlow and Keras in Python. (Includes: Data, Case Study Paper, Code)

  • Updated Apr 19, 2022
  • Python

This project leverages Spotify's API and user-provided playlists to create and tune a neural network model that generates song recommendations based on song data in the provided playlists.

  • Updated Jun 24, 2022
  • Jupyter Notebook

CVE-Search (name still in alpha) is a machine learning tool focused on detecting exploits or proofs of concept on social networks such as Twitter and GitHub. It can also run related searches for CVEs on Google, Yandex and DuckDuckGo and detect whether the content may be a functional exploit, a proof of concept, or simply information about the vulnerability.

  • Updated Jan 6, 2021
  • Jupyter Notebook

In this work, we propose a deterministic version of Local Interpretable Model-Agnostic Explanations (LIME). Experimental results on three different medical datasets show the superiority of Deterministic Local Interpretable Model-Agnostic Explanations (DLIME).

  • Updated Jun 22, 2022
  • Jupyter Notebook

A Linear Regression model to predict the car prices for the U.S market to help a new entrant understand important pricing variables in the U.S automobile industry. A highly comprehensive analysis with detailed explanation of all steps; data cleaning, exploration, visualization, feature selection, model building, evaluation & MLR assumptions validity.

  • Updated Jul 28, 2020
  • Jupyter Notebook

A machine learning case study focused on interaction with digital characters. It uses a character called "Kaio" that, based on automatic detection of facial expressions and classification of emotions, interacts with humans by classifying their emotions and imitating their expressions.

  • Updated May 18, 2018
  • Jupyter Notebook

🔱 Some well-known machine learning and pattern recognition algorithms (Decision Tree, AdaBoost, Perceptron, Clustering, Neural Network, etc.) implemented from scratch in Python. Data sets are also included to test the algorithms.

  • Updated May 9, 2019
  • Python

This project has 3 goals: (1) find the best machine learning pipeline for predicting ASD cases using genetic algorithms, via the TPOT library (a classification problem); (2) compare the accuracy of the resulting pipeline with that of a standard Naive Bayes classifier; (3) save the classifier as an external file and use this file in a Flask API to make predictions in the cloud.

  • Updated May 28, 2020
  • Rich Text Format

This program consists of a clean and polished Graphical User Interface (GUI) that interacts with 8 Machine Learning models and data visualization tools through different Python libraries. The user selects which model to run on the testing data and is then taken to a screen displaying the prediction results for the testing data as well as the overall model accuracy. The screen also includes various buttons that, when selected, display complex and attractive data visualizations of the testing data.

  • Updated May 22, 2019
  • Python

Thanks to digitization, we often have access to large databases consisting of various fields of information, ranging from numbers to text and even boolean values. Such databases lend themselves especially well to machine learning, classification and big data analysis tasks. We can train classifiers on existing data and use them to predict the value of a certain field, given information about the other fields. Specifically, in this study, we look at the Electronic Health Records (EHRs) compiled by hospitals. EHRs are a convenient means of accessing the data of individual patients, but processing them as a whole remains a challenge. However, EHRs that are composed of coherent, well-tabulated structures lend themselves quite well to machine learning via classifiers. In this study, we look at a Blood Transfusion Service Center Data Set (data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan). We used the scikit-learn machine learning library in Python. From the Support Vector Machines (SVM) module we use Support Vector Classification (SVC), and from the linear model module we import Perceptron. We also used KNeighborsClassifier and the decision tree classifier. We split the database into 2 parts: the first was used to train the classifiers, and the second was used to verify whether the classifiers' predictions matched the actual values.

  • Updated Aug 5, 2018
  • Python
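The workflow described in the entry above (split the data into two parts, train several classifiers on the first, check predictions on the second) could look roughly like the sketch below. It is an illustration under stated assumptions: synthetic data from `make_classification` stands in for the Blood Transfusion data set, and the 50/50 split and default hyperparameters are guesses rather than the study's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (placeholder for the real data set)
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Part 1 trains the classifiers, part 2 verifies their predictions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

classifiers = {
    "SVC": SVC(),
    "Perceptron": Perceptron(random_state=0),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

# score() returns the fraction of held-out samples predicted correctly
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in classifiers.items()
}

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

All four estimators share the same `fit` and `score` interface, which is what makes this kind of side-by-side comparison a short loop in scikit-learn.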

Parkinson's disease is associated with movement disorder symptoms such as tremor, rigidity, bradykinesia, and postural instability. Bradykinesia and rigidity often manifest in the early stages of the disease. They have a noticeable effect on the handwriting and sketching abilities of patients, and micrographia has been used for early-stage diagnosis of Parkinson's disease. While a person's handwriting is influenced by a number of factors such as language proficiency and education, sketching a shape such as a spiral has been found to be a non-invasive and independent measure.

  • Updated Dec 20, 2019
  • Python
Home-Credit-Default-Risk-Recognition

The project provides a complete end-to-end workflow for building a binary classifier in Python to recognize the risk of housing loan default. It includes methods like automated feature engineering for connecting relational databases, comparison of different classifiers on imbalanced data, and hyperparameter tuning using Bayesian optimization.

  • Updated Jul 2, 2020
  • Jupyter Notebook

Fake News Detection System for detecting whether news is fake or not. The model is trained using "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection. Link for dataset: https://arxiv.org/abs/1705.00648.

  • Updated Jan 25, 2020
  • Jupyter Notebook

EDA and Machine Learning Models in R and Python (Regression, Classification, Clustering, SVM, Decision Tree, Random Forest, Time-Series Analysis, Recommender System, XGBoost)

  • Updated Jun 2, 2022
  • Jupyter Notebook

Multi-docker container data science / engineering playground (w/ Kafka, Airflow, MLFlow, Tensorflow-Keras / SKLearn) for simulating a microservices-oriented architecture

  • Updated Nov 22, 2022
  • Dockerfile
Vehicle-detection-using-deep-learning-with-Tensorflow-and-Python

This repository offers several learning strategies and advanced study material, along with interesting use cases and programs that will help jump-start your journey to becoming a data scientist with Python!

  • Updated Sep 2, 2019
  • Jupyter Notebook

This is a simple image classification project built on the Keras/TensorFlow API with the MobileNetV2 deep neural network architecture, initialized with pre-trained 'imagenet' weights. The trained model (mask-detector-model.h5) takes real-time video from a webcam as input and predicts whether the face in the Region of Interest (ROI) is 'Mask' or 'No Mask', with real-time on-screen accuracy.

  • Updated Nov 22, 2022
  • Jupyter Notebook

Diego: Data in, IntElliGence Out. A fast framework that supports the rapid construction of automated learning tasks. Simply create an automated learning study (Study) and generate correlated trials (Trial). Then run the code and get a machine learning model. Implemented using Scikit-learn API glossary, using Bayesian optimization and genetic algorithms for automated machine learning. Inspired by [Fast.ai](https://github.com/fastai/fastai).

  • Updated Feb 3, 2021
  • Python

This repository contains all the projects and labs I worked on while pursuing professional certificate programs, specializations, and bootcamp. [Areas: Deep Learning, Machine Learning, Applied Data Science].

  • Updated Oct 13, 2020
  • Jupyter Notebook
DataCamp-Courses--Notes-Exercises-Projects

Implements an entire machine learning pipeline to train and evaluate a Random Forest Classifier on labeled gait data for walking. Data generated during the experiment has led to helpful insights into the problem domain.

  • Updated Jul 7, 2022
  • Python

Random Forest model for creating a susceptibility map for shallow landslides // Master's thesis in Earth Sciences (Applied Geology) - Università degli Studi di Milano

  • Updated Apr 28, 2021
  • Python

I'm attempting the NYC Taxi Duration prediction Kaggle challenge. I'll be using a combination of Pandas, Matplotlib, and XGBoost as Python libraries to help me understand and analyze the taxi dataset that Kaggle provides. The goal is to build a predictive model for taxi trip duration. I'll be using Google Colab as my Jupyter notebook, and I will also run the predictions without Google Colab on a regular system.

  • Updated Sep 21, 2018
  • Jupyter Notebook

Jupyter notebook that outlines the process of creating a machine learning predictive model. Predicts the peak "Wins Shared" of the current draft prospects based on numerous features such as college stats, projected draft pick, physical profile and age. I try out multiple models and pick the one that, in my judgment, performs best on the data.

  • Updated Apr 17, 2018
  • Jupyter Notebook

In the banking industry, detecting credit card fraud using machine learning is not just a trend; it is a necessity for banks, as they need to put proactive monitoring and fraud prevention mechanisms in place. Machine learning helps these institutions reduce time-consuming manual reviews, costly chargebacks and fees, and denial of legitimate transactions. Suppose you are part of the analytics team working on a fraud detection model and its cost-benefit analysis. You need to develop a machine learning model to detect fraudulent transactions based on the historical transactional data of customers with a pool of merchants.

  • Updated May 21, 2021
  • Jupyter Notebook

Build a model to identify employees involved in the Enron fraud case based on an email and financial data set. Use feature selection and engineering, algorithm selection, and model selection based on F1 score, precision, and recall.

  • Updated Sep 19, 2017
  • Jupyter Notebook
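The three metrics named in the entry above (precision, recall, F1 score) can be computed with scikit-learn as in the sketch below. The label vectors are made up for illustration, with 1 standing for a flagged person, and have nothing to do with the actual Enron data.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = person of interest)
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

# From these vectors: 2 true positives, 1 false positive, 1 false negative
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

On imbalanced problems like fraud detection, plain accuracy can look high even when few positives are found, which is one reason to select models on precision, recall and F1 instead.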

Final results and Python code for an experiment applying predictive modeling, using LSTMs, to time-series gait data. The "Resampling and Epochs test" shows the results of the first iteration of optimizing model parameters two at a time. "Batch Size and Neurons test" shows the results of the second test, optimizing the remaining two parameters.

  • Updated Sep 26, 2017
  • Python

Forecasting future sales of a product offers many advantages. Predicting future sales of a product helps a company manage the cost of manufacturing and marketing the product. In this notebook, I will try to take you through the task of future sales prediction with machine learning using Python.

  • Updated Sep 29, 2022
  • Jupyter Notebook

This is a collection of scripts and tools intended to provide a template for integrating and applying Scikit-Learn with ArcGIS Pro. The tools enable access to various machine learning algorithms through scripting tools in the geo-learn toolbox. They largely work by passing geographic coordinates and related data to be clustered or analyzed to help with spatial analysis tasks, data reduction, or cartography. In addition, the tool sets include regression analysis tools for exploring the ability of different Scikit-Learn models to provide predictive analysis.

  • Updated Apr 6, 2020
  • Python

Created by David Cournapeau

Released January 05, 2010

Latest release about 1 month ago

Repository
scikit-learn/scikit-learn
Website
scikit-learn.org
Wikipedia
Wikipedia

Related Topics

python scikit